Supplementary: State Space Methods for Phase Amplitude Coupling

Supplementary Section .1 Derivations Power Spectral Density for the State Space Oscillator Process In this section, we derive the parametric expression for the power spectral density (PSD) of an oscillation x j 1,t by building an autoregressive moving average process (ARMA) with the same spectral content. For convenience, we will drop the index j in what follows. First, we note that an oscillation is asymptotically second order stationary. Let us compute its autocovariance sequence. Since R is a rotation matrix R(ω)k = R(kω) and RR⊺ = I. Therefore, from equation (1), it comes:


Power Spectral Density for the State Space Oscillator Process
In this section, we derive the parametric expression for the power spectral density (PSD) of an oscillation x j 1,t by building an autoregressive moving average process (ARMA) with the same spectral content. For convenience, we will drop the index j in what follows. First, we note that an oscillation is asymptotically second order stationary. Let us compute its autocovariance sequence. Since R is a rotation matrix R(ω) k = R(kω) and RR ⊺ = I. Therefore, from equation (1), it comes: and: We can hence write, for t = 1..N, k = 0..N − t, s t,t+k = E(x 1,t+k x 1,t ⊺ ) = s k . As a consequence, an oscillation can be approximated by a second order stationary process, and in virtue of the Wiener-Khinchin theorem 38 , its theoretical power spectral density is: We now consider the ARMA(2,1): x t = φ 1xt−1 + φ 2xt−2 +ũ t + ψ 1ũt−1 ,ũ t ∼ N (0,σ 2 ) to which we impose, for t = 1..N, k = 0..N − t: E(x tx ⊺ t+k ) = s k . It follows that: s k =φ 1 s k−1 + φ 2 s k−2 , k > 2 Taking: satisfies the first equality of equation (38). The remaining conditions can then be rewritten: ψ 1σ 2 = −aσ 2 cos(ω) 0 = ψ 2 1 + 1 + a 2 a cos(ω) ψ 1 + 1 1/ 13 from which we deduce 2 two negative roots ψ ± 1 ultimately leading to the same autocovariance series. Overall, we choose: Applying the discrete Fourier transform to equation (37) yields: sinceũ t is Gaussian noise, E(ũ tũt+k ⊺ ) = δ kσ 2 . Therefore, our ARMA(2,1) PSD is: Finally, the PSD of an oscillation centered in f 0 is ;

An Expectation-Maximization (EM) Algorithm for Independent and Harmonic Oscillation Decompositions
Since all noise terms are assumed to be additive Gaussian, the complete data log likelihood for one time window of length N is: We wish to maximize log L with respect to Θ = (Φ, Q, R) but we do not have access to the hidden oscillations x t . We use an expectation maximization algorithm to alternatively and iteratively estimate (E-Step) and maximize (M-Step) the log likelihood. At iteration r, we use the Kalman filter to estimate x t given a set Θ r which gives us access to: Then, we deduce Θ r+1 :

Kalman Filter and Fixed Interval Smoother Estimates
We use the Kalman filter to estimate the hidden oscillations given the observations and the model parameters. They first predict the state at the next time point, then compare that prediction to the observation, and finally produce an updated estimate based on the most recently observed data. Given the full observation time series, we can apply backward smoothing to refine the update to account for the full observation series (i.e., fixed interval smoothing).

2/ 13
and compute those quantities using the forward smoothing algorithm: and a set of backward recursions 50 . For t = N..1, and t 1 < t 2 : Backward Gain : Smoothing :

E-Step
Here we describe G r (Θ) for the state space oscillation decomposition with harmonic components. An single oscillation is defined by a rotation matrix R(ω), a scaling parameter a and a process noise covariance matrix Q = σ 2 I 2×2 . In what follows, we consider d independent oscillations, which are respectively associated to h 1 , ..., h d harmonics. For j = 1..d, an oscillation with fundamental frequency ω j is the sum of h = 1...h j , harmonics respectively defined by R(hω j ), a j,h and Q j,h = σ 2 j,h I 2×2 . We note D = ∑ d j=1 h j the total number of oscillatory components.
For V ∈ R 2D×2D , j = 1..d and h = 1..h j , we note V j,h the 2 by 2 diagonal block associated with the h th j harmonic of oscillation j. Φ and Q are block diagonal matrices whose diagonal blocks are a j,h R(h j,h ) and Q j,h : Additionally, we will use M = [1 0 1 0 ... 0] ∈ R 2D and for U ∈ R 2×2 , we note: rt(U) := U 21 − U 12 and tr(U) = U 11 +U 22 .
Taking the conditional expectation of the log likelihood log L at iteration r for a fixed set of parameter Θ r = (Φ, Q, R) r , we obtain 23 : where:

M-Step
We can maximize G r with respect to R and (Φ, Q) independently. We have: Since Q is (block) diagonal, we can write: A is symmetric and Φ is a block diagonal matrix whose element are 2 × 2 rotation matrices, we develop equation (53) and obtain: Differentiating with respect to process noises covariances σ 2 j,h , scaling parameter a j,h and fundamental frequencies ω j yields: We inject the upper equations of (58) into the third one and we note: Using trigonometric identities, equations (58) can finally be rewritten: Overall, for j = 1..d, if h j > 1, we numerically solve for ω j using equation (60) and deduce a j,h and σ 2 j,h for h = 1..h j .
If h j = 1, we immediately have:

4/ 13
Linear Regression: priors, hyperparameters, and normalizing constants As in 47 , we use τ β = σ −2 β and we assume that the likelihood of the model defined in equation (21) is: The conjugate prior is: We choose prior hyperparametersṼ ,bν andβ = [β 0β1β2 ] ⊺ to convey as little information as possible on the phase and the strength of the modulation. Marginalizing equation (63) over τ β , yields a truncated multivariate t-distribution: ν ≥ 3 insures that the multivatiate-t variance is defined. It is:ν(ν − 2) −1bṼ −1 . Then, we consider the independent random variablesÃ,K andφ such that: We note γ = ÃÃK cos(φ )ÃK sin(φ ) ⊺ , useb = 1,ν = 3 and we defineβ , andṼ such that: and since E s ( cos(φ s t + φ mod ) t ) = 0 (where E s represents an average over trials or windows and ⟨.⟩ t is a temporal average across a given window), Posterior parameters are then given by: Where:

5/ 13
We deduce the posterior distribution: The normalizing constant Z is obtained by integration and is: where P(β ∈ W (K)) is computed using the multivariate t-distribution of parameters ν, β and b −1 V .
Finally, we deduce equation (24) by marginilizing equation (70) over τ β and have: Note that for large samples it might be useful to use:

Second State-Space
In this section, for a given AR order p, we estimate the parameters R β , Q β and {h k } p k=1 defined equation (6). Let C m = E(β SSP T β SSP T −m ⊺ ) be the autocovariance sequence of the modulation vectors estimated with equation (25). We have: where δ i, j is the the Kronecker delta. Equation (74) can be rewritten: For an observation noise candidate R * β , if we can invert equation (75), we immediately access Q * β and {h * k } p k=1 . Using the Kalman Filter, we hence deduce the likelihood of the candidate model as in 22 .
Therefore, we note R m β the smallest eigenvalue of the Toeplitz matrix C = (C |i− j| ) i, j=0..p , and, numerically optimize the model likelihood with respect to R * β in (0, R m β ), where we know that (C − IR * β ) is full rank.
From R β we get Q β and {h k } p k=1 then, once again, we use the Kalman Filter to estimate β dSSP T .

Modulogram Equivalence
We derive an approximated parametric modulogram for a window of length δt = N/F s centered in τ. We will use: For clarity we will use t or k without distinction and we remind that: Additionally, we assume that: • and, for simplicity, for ψ ∈ [−π, π], for all h : We hence have: sin(ω s /(2F s )) ≤ 2F s ω s . Therefore: On the other hand: But ∑ k∈Ω τ,ψ ε k = O p (δ ψ √ l) and l ∝ N so:

7/ 13
Using the same argument as the one detailed above we get: Additionally: Where γ is a function such that γ(x) −−→ x→0 0. Since: For ω s F s "sufficiently small", we can write: Finally:

Supplementary Section .2 Additional Results
In this section we present additional results to support the validation and comparison of our algorithm: Δf gen = 1Hz Δf gen = 2Hz Δf gen = 3Hz Figure S.6. Modulation phase φ mod estimation and comparison between standard methods (black), DAR (pink) and SSP (blue). 400 windows of 2 seconds were generated with: a slow oscillation (filtered from white noise around f s = 3Hz with ∆ f gen s and normalized to standard deviation σ s ) and a modulated fast oscillation (φ mod = −π/3, modeled with a sinusoid f s = 50Hz and normalized to σ f ). We added unit normalized Gaussian noise and we used 3 Signal To Noise Ratio (SNR) conditions ((σ s , σ s ) = (2, 1.5), (1, 0.6) and (0.7, 0.3)). We show typical signal traces for these different conditions in Fig S.  and normalized to standard deviation σ s ) and a modulated fast oscillation (φ mod = −π/3, modeled with a sinusoid f s = 10Hz and normalized to σ f ). We added unit normalized Gaussian noise and we used 3 Signal To Noise Ratio (SNR) conditions ((σ s , σ s ) = (2, 1.5), (1, 0.6) and (0.7, 0.3)). We report typical signal traces for the different conditions (top), slow oscillation recovery alongside the true slow frequency PSD (middle), and fast frequency recovery (bottom). The red arrow indicates the true multitaper PSD (TW=4, K=5 tapers) peak for each oscillation. and normalized to standard deviation σ s ) and a modulated fast oscillation (φ mod = −π/3, modeled with a sinusoid f s = 50Hz and normalized to σ f ). We added unit normalized Gaussian noise and we used 3 Signal To Noise Ratio (SNR) conditions ((σ s , σ s ) = (2, 1.5), (1, 0.6) and (0.7, 0.3)). We report typical signal traces for the different conditions (top), slow oscillation recovery alongside the true slow frequency PSD (middle), and fast frequency recovery (bottom). The red arrow indicates the true multitaper PSD (TW=4, K=5 tapers) peak for each oscillation.